-
Notifications
You must be signed in to change notification settings - Fork 115
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
go/runtime/txpool: transactions in incoming messages #4681
Conversation
pro-wh
commented
Apr 22, 2022
•
edited
Loading
edited
- roothash incoming messages' data is no longer arbitrary bytes. it is now a runtime transaction
- expand txpool to allow multiple tx queues. each queue can have its own way of organizing transactions. the existing "schedule queue" is now called the main queue
- add a queue that exposes roothash incoming messages' transactions
- while we're at it, add a queue for local transactions that we want to prioritize and preserve the order of. may come in handy later
- add a mutex so that we don't recheck/republish/schedule at the same time
simplified map of executor worker sourcedigraph {
node [shape=none]
"(block from consensus)" -> HandleNewBlockLocked -> maybeStartProcessingBatchLocked
"(message with signed proposal)" -> queueBatchBlocking -> handleExternalBatchLocked -> maybeStartProcessingBatchLocked -> startProcessingBatchedLocked
startProcessingBatchedLocked -> resolveBatchLocked -> "ub.resolve"
g1 [label="goroutine"]
resolveBatchLocked -> g1 -> requestMissingTransactions -> "(import from incoming messages)"
requestMissingTransactions -> "(request through txsync)"
g2 [label="goroutine"]
startProcessingBatchedLocked -> g2
g2 -> getRtStateAndRoundResults -> "(read from consensus)"
g2 -> runtimeExecuteTxBatch -> "(entry into runtime)"
} inserted a routine to import transactions from incoming messages at the beginning of requestMissingTransactions (outside of backoff-retry) older proposal, not usedseems like this will need to start a goroutine as we have some locks held, and we'd need to go off and access consensus state. probably going to need a new state, in that case, to help coordinate the completion of "get transactions from roothash incoming messages" and "request the rest over txsync"or it might be okay to add the step in the goroutine before requestMissingTransactions. I'll check if we're guaranteed to be caught up to the block in the proposal |
d3c4537
to
ec6f4f4
Compare
updated, now a routine runs during requestMissingTransactions uses existing
|
ec6f4f4
to
01017ee
Compare
46be265
to
db4180d
Compare
updated this to repurpose |
bleh I need to change this some more, likely to make it have a separate queue in the txpool struct
some other thoughts:
|
various thoughts from today: requirements for separate incoming message embedded transactions queue
decoding discrepancies between node and runtime
|
design notes from today:
|
design notes from today:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be cool if you started drafting some of this in code form to make it easier to follow. I think that going with multiple queues is the right approach.
Also it would be great if we didn't need to break the runtime host protocol for this change. What exactly will be changing in a breaking way? Can it be solved with features instead of breaking the protocol?
ec73c49
to
eb69ec2
Compare
there are indeed a lot of places where we relied on the cached tx hash. if many of these remain, let's use a much simplified {raw: byte[], hash: hash} Transaction struct |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would suggest splitting this into two steps/PRs:
- Refactoring the transaction pool.
- Adding the RIM queue.
Also why are we going back to untagged transactions (e.g. ones without metadata) instead of having something like the current Transaction
type which follows the transaction over its lifetime in the queues? The general one could be simplified (hash and first seen time?) and then queues can wrap with their own metadata if needed?
adding RIM queue is too important to the design of the multiple queue system for us to do the refactor without it. may be able to do both and then remove the RIM queue in one PR though
not so much that we must go to untagged transactions, but we're experimenting with it because there were no strong uses of it outside of the queue noted while exploring the existing code and considering the goals. we may find some reasons to have tagged transactions during implementation |
72221aa
to
8f18c8c
Compare
changed: OfferChecked now gives singular tx. additional metadata and error return in the future probably currently pondering details of republishing. seems we currently all txs, even ones received from outside. whereas local txs we publish immediately after checktx, nonlocal ones we defer until the next republish wakeup. and then thereafter they all get republished every minute. if a republish attempt fails (although the current p2p implementation never reports any error), it'll be retried on the next wakeup |
The interval is actually updated based on the P2P min republish interval (which I believe bumps it somewhere over 2 minutes currently). Transactions received from outside don't need to be republished immediately (as they will be propagated further via gossipsub). The only reason why republish exists at all is because gossipsub is not reliable. Although since we added txsync maybe this is not really critical anymore. |
stuff from today: adding some ideas about managing republishing, to space txs out uniformly throughout the republish interval. the hypothesis is that evening out the publish bandwidth is somehow good. since we'll have two queues that want to republish, we could extract out this common interval management functionality. pragmatically the two queues will push to the republisher rather than the republisher pulling, because the general design of these usable tx queues doesn't expose when txs are added or removed. this would have been nice to have the check queue give to a usable tx queue, then the usable tx queue gives to the republish queue, then the republisher worker gets from the republish queue, eliminating the need for the check queue to republish directly when completing local txs. and the idea was for the local queue to use PushNow to insert things at the front of the republish queue, but this has the effect of reversing the transaction order. also of not really being immedate due to the uniform spacing taking giving a finite amount of time in between txs. might have to have a special code path for "publish right now but also put it at the end of the queue" after all. and the whole handling of failed publish is missing. (although the p2p system doesn't currently support failed publish) |
ah let's continue to use the seenCache for republishing 🤷 ain't nobody asked us to even out the republish bandwidth |
44447e8
to
fa09b25
Compare
actually yeah we'll definitely need to record timestamps for republishing. I just checked the default settings, and the recheck interval is 5 rounds, so ~40 sec on emerald. if we keep rechecking and saying "publish these ~2 minutes from now" but then removing them after 40 seconds (as we would in this new transactions-being-rechecked-are-removed-from-the-queue design), they'll never really be republished |
fa09b25
to
602d66a
Compare
notes from today:
|
oasis_txpool_pending_schedule_size now reports the size of the main queue (num txs) main queue and local queue are updated in:
roothash incoming queue is updated in:
|
a05f5dd
to
3b7b1db
Compare
d42912b
to
23fdef9
Compare
23fdef9
to
c072a38
Compare
thanks for reviewing! |